Modeling interactions

Scarr-Rowe

We are going to use the Scarr-Rowe effect (or hypothesis) to look more closely at interaction effects. The Scarr-Rowe hypothesis states that the (genetic) heritability of a trait depends on the environment, such that the effects of genes are larger when environments are better. The underlying idea is that if everyone lives in a perfect environment, i.e. there is no variation in the relevant environment, then a trait will only depend on genes.

This interaction can be visualized as follows:

par(mar=c(3,3,.5,.5), mgp=c(1.75,.5,0), tck=-.01)
heritability = c(scarce = 0.5, plentiful = 0.8)
barplot(heritability, ylab = "heritability", ylim = c(0,1), xlab = "environment")

Here is a DAG that describes such a model, where

  • A = additive genetic effects
  • C = common effects in a family
  • E = idiosyncratic effects and measurement error

and the Scarr-Rowe effect means that the coefficient of the path \(\small A \rightarrow IQ\) depends on \(\small SES\).

library(dagitty)
library(rethinking) # for drawdag() and the model fitting below
dag = dagitty(
  "dag{
  IQ;
  A -> IQ;
  C -> IQ;
  E -> IQ;
  SES -> IQ;
  }")
coord.list = 
  list(
    x=c(A=1,C=2,E=3,SES=2,IQ=2),
    y=c(A=-1,C=-1,E=-1,SES=1,IQ=0))
coordinates(dag) = coord.list
drawdag(dag, cex = 1.5)

Note that DAGs do not encode interaction effects by drawing an arrow from the moderator to the relevant path. Instead, there is simply a path from the moderator to the outcome variable. So the DAG only tells us that IQ is a function of four other variables, \(\small IQ = f(A,C,E,SES)\), but it does not tell us what the function \(f()\) is. This could be \(\small IQ = A + C + E + SES\), \(\small IQ = A \cdot C \cdot E + SES\), \(\small IQ = A \cdot SES + C \cdot SES + E\), \(\small IQ = A \cdot SES + \log(C) \cdot SES + E\), or any imaginable function of these variables.
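To make this concrete, here is a small sketch of two structural functions that are both consistent with the same DAG, only one of which contains an interaction (the function names are illustrative, not part of the model above):

```r
# Both functions respect the DAG (IQ depends on A, C, E and SES),
# but only the second one encodes a Scarr-Rowe-style interaction.
f_additive    = function(A, C, E, SES) A + C + E + SES
f_interaction = function(A, C, E, SES) A * SES + C * SES + E

f_additive(2, 1, 1, 2)     # 6: the effect of A does not depend on SES
f_interaction(2, 1, 1, 2)  # 7: the effect of A is scaled by SES
```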

Let's simulate data that show the Scarr-Rowe effect, starting with our exogenous variables. To keep things simple, we assume that all variables are normally distributed:

set.seed(3)
N = 250
SES = rnorm(N,mean = 5,1)
A = rnorm(N,mean = 10,2)
C = rnorm(N,mean = 10,2)
E = rnorm(N, sd = 7)

Now comes the interesting part: Scarr-Rowe assumes that the effect or weight of genes depends on the SES. So we formulate the weight of genes as a function of SES. For good measure, we also let the effect of the familial environment vary with SES.

library(boot)
# inv.logit keeps the SES-dependent
# part of the weights in the range 0-1
b_A = function(SES) 1 + inv.logit(SES-5)*3
b_C = function(SES) inv.logit(-SES+5)

We are literally defining the weight for A as a function of SES. This is one way to understand interactions. Here is a visualization of the weights:

par(mar=c(3,3,.5,.5), mgp=c(1.75,.5,0), tck=-.01)
curve(b_A(x),2,8, ylab = expression("effect size"~beta), 
      xlab = "SES", ylim = c(0,4), col = "red")
curve(b_C(x),2,8, xlab = "SES", add = T, col = "blue")
text(c(5,5),
     c(b_A(5), b_C(5)),
     labels = c(expression(beta[A]),
                expression(beta[C])), pos = 3)

In this plot the interaction is not encoded by the fact that \(\small \beta_A\) and \(\small \beta_C\) have different slopes, but by the fact that both have a slope in the first place. The figure shows two interactions, because the effect sizes for \(\small A\) and \(\small C\) both change with SES.
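We can also read the interaction off the numbers directly: the weight of \(\small A\) is quite different at low and high SES (the definition of b_A is repeated here so the chunk runs on its own):

```r
library(boot)
b_A = function(SES) 1 + inv.logit(SES - 5)*3  # same definition as above

b_A(3)           # weight of A at SES = 3, about 1.36
b_A(7)           # weight of A at SES = 7, about 3.64
b_A(7) - b_A(3)  # about 2.28: each unit of A buys more IQ at high SES
```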

And now we can simulate IQ values using the exogenous variables and the derived weights. We assume that IQ depends on familial factors \(\small C\), additive genetic effects \(\small A\), and \(\small SES\):

IQ = b_C(SES)*C + b_A(SES)*A + 0*SES
IQ = 65 + IQ + E

If we just look at the bivariate associations between \(\small A\) or \(\small SES\) and \(\small IQ\), we get the following plot:

par(mfrow = c(1,2),mar=c(3,3,1,.5), mgp=c(1.75,.5,0), tck=-.01)
plot(SES,IQ, pch = 16, cex = .5)
abline(lm(IQ~SES))
plot(A,IQ, pch = 16, cex = .5)
abline(lm(IQ~A))

# plot(C,IQ)
# abline(lm(IQ~C))

To run a simple interaction model with a categorical moderator, we use only a subset of our sample, the bottom and top 30%, and code a categorical SES variable SES.c:

low.SES = which(SES < quantile(SES,.30))
high.SES = which(SES > quantile(SES,.70))
idx = c(low.SES,high.SES)
dt = data.frame(A = A[idx],
                C = C[idx],
                IQ = IQ[idx],
                SES = SES[idx],
                SES.c = c(rep(1,length(low.SES)),
                          rep(2,length(high.SES))))

Let's plot the association between \(\small A\) and \(\small IQ\) again, this time split by SES:

low.SES = dt$SES.c == 1
high.SES = dt$SES.c == 2
plot_data = function() {
  par(mar=c(3,3,1,.5), mgp=c(1.75,.5,0), tck=-.01)
  plot(IQ ~ A, data = dt, col = dt$SES.c, pch = 16, cex = .5,
       ylim = range(dt$IQ), ylab = "IQ", xlab = "A")
  legend("topleft", pch = 16, col = c("black","red"),
         legend = c("low SES","high SES"), bty = "n")
  
}
plot_data()
abline(lm(IQ ~ A, data = dt[low.SES,]), col = "black")
abline(lm(IQ ~ A, data = dt[high.SES,]), col = "red")

Simple linear regression

We start with a simple linear regression and build successively more complex models. But first, some intuitions about the priors:

  • Eyeballing the data, we see that IQ at the average \(\small A\) of 10 is around 100, so we use the prior a ~ dnorm(100,5) (the model centers \(\small A\) at 10, so a is the expected IQ at the average \(\small A\))
  • The range of \(\small A\) is roughly 16-4 = 12 and the range of IQ is roughly 130-80 = 50. So even if the full range of IQ were explained by \(\small A\), a one unit increase of \(\small A\) would change IQ by at most around 50/12 ≈ 4. A prior of b ~ dnorm(0,2) covers effects of this size within 2 sd and still allows for the possibility of a negative effect of \(\small A\)
  • Lacking a strong intuition for the error variance, we set a weakly informative prior for sigma, dexp(0.5)
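These choices can be sanity-checked with plain R before fitting anything (this is just a check, not part of the model code):

```r
# 90% of the intercept prior dnorm(100,5) lies in this IQ range:
qnorm(c(.05, .95), mean = 100, sd = 5)  # about 91.8 to 108.2
# and 90% of the slope prior dnorm(0,2) lies here:
qnorm(c(.05, .95), mean = 0, sd = 2)    # about -3.3 to 3.3
```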
IQ.A = 
  quap(
    alist(
      IQ ~ dnorm(mu,sigma),
      mu <- a + b*(A - 10),
      a ~ dnorm(100,5),
      b ~ dnorm(0,2),
      sigma ~ dexp(.5)
    ),
    data=dt)

Let's quickly look at the prior predictions to make sure the priors are OK.

prior <- extract.prior(IQ.A)
A.seq <- seq(from=floor(min(A)),to=ceiling(max(A)),length.out=30)
mu <- link(IQ.A,post=prior,data=data.frame(A=A.seq))
par(mar=c(3,3,1,.5), mgp=c(1.75,.5,0), tck=-.01)
plot(0,type = "n", xlim = c(min(A),max(A)), ylim = c(70,130), ylab = "IQ", xlab = "A")
matlines(A.seq,t(mu[1:100,]),type = "l", col = adjustcolor("blue", alpha = .5), lty = 1)

Yes, this looks good.

Here are the posterior predictions:

plot_data()
plot_mu.CIs = function(q.fit,data,col = "black", spaghetti = FALSE) {
  mu = link(q.fit,data=data)
  lines(A.seq, colMeans(mu), col =  col)
  if (!spaghetti) {
    CIs = apply(mu, 2, PI)
    shade(CIs,A.seq,col = adjustcolor(col,alpha = .25))
  } else {
    matlines(A.seq,t(mu[1:250,]),col = adjustcolor(col,alpha = .05),lty = 1)
  }
  
}
plot_mu.CIs(IQ.A, data.frame(A=A.seq), "blue", spaghetti = TRUE)

Linear regression with category-main effect

To fit a model with a main effect of SES, we use the indexing approach:

set.seed(1)
IQ.A_SES = 
  quap(
    alist(
      IQ ~ dnorm(mu,sigma),
      mu <- a[SES.c] + b*(A - 10),
      a[SES.c] ~ dnorm(100,5),
      b ~ dnorm(0,4),
      sigma ~ dexp(.5)
    ),
    data=dt)
plot_data()
plot_mu.CIs(IQ.A_SES, data.frame(A=A.seq, SES.c = 1), "black")
plot_mu.CIs(IQ.A_SES, data.frame(A=A.seq, SES.c = 2), "red")

We can already see from the plot that the model with separate means for high and low SES is better. But here is the comparison with PSIS-LOO:

compare(
  IQ.A,IQ.A_SES,
  func = "PSIS") %>% 
  round(2)
##             PSIS    SE dPSIS  dSE pPSIS weight
## IQ.A_SES 1014.22 14.63  0.00   NA  4.08      1
## IQ.A     1077.74 15.49 63.52 12.5  3.09      0

Linear regression with category-main effect and category-slope

Lastly, we can also let the slopes vary by SES:

IQ.AxSES = 
  quap(
    alist(
      IQ ~ dnorm(mu,sigma),
      mu <- a[SES.c] + b[SES.c]*(A - 10),
      a[SES.c] ~ dnorm(100,5),
      b[SES.c] ~ dnorm(0,2),
      sigma ~ dexp(.5)
    ),
    data=dt)

And again the posterior predictions:

par(mfrow = c(1,2), mar=c(3,3,2,.5), mgp=c(1.75,.5,0), tck=-.01)
plot(IQ ~ A, data = dt[dt$SES.c == 1,], pch = 16, cex = .5, main = "low SES",
       ylim = range(dt$IQ), ylab = "IQ", xlab = "A")
plot_mu.CIs(IQ.AxSES, data.frame(A=A.seq, SES.c = 1), "black")
plot(IQ ~ A, data = dt[dt$SES.c == 2,], col = "red", pch = 16, cex = .5, main = "high SES",
       ylim = range(dt$IQ), ylab = "IQ", xlab = "A")
plot_mu.CIs(IQ.AxSES, data.frame(A=A.seq, SES.c = 2), "red")

compare(
  IQ.A, IQ.A_SES, IQ.AxSES,
  func = "PSIS") %>% 
  round(2)
##             PSIS    SE dPSIS   dSE pPSIS weight
## IQ.AxSES 1007.89 16.18  0.00    NA  5.23   0.97
## IQ.A_SES 1014.74 14.70  6.85  6.87  4.31   0.03
## IQ.A     1077.94 15.58 70.05 13.94  3.18   0.00

We get a warning about Pareto k values being higher than 0.5, but this is (more or less) OK as long as k does not exceed about 0.7 (or, more liberally, 1).
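If we want to see which observations trigger the warning, we can ask for the pointwise values (this assumes the rethinking package's PSIS() with pointwise = TRUE, which returns one row per observation including its Pareto k):

```r
# inspect the individual Pareto k values of the interaction model
k = PSIS(IQ.AxSES, pointwise = TRUE)$k
sum(k > 0.5)  # number of observations above the warning threshold
max(k)        # should stay below about 0.7
```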

So how can we figure out whether the difference in slopes is reliably larger than 0? We simply extract posterior samples and do some calculations with them:

post = extract.samples(IQ.AxSES)
names(post)
## [1] "sigma" "a"     "b"
head(post$b)
##           [,1]     [,2]
## [1,] 1.1359434 2.986543
## [2,] 1.4707762 2.572882
## [3,] 1.1873543 2.867298
## [4,] 0.8675068 2.518522
## [5,] 1.0388861 3.042012
## [6,] 1.6357406 2.226513

We simply calculate the difference between the two b parameters:

delta.b = post$b[,2]-post$b[,1]

And now we can show e.g. mean and PIs:

c(mean = mean(delta.b),
  PI(delta.b,prob = c(.9)))
##     mean       5%      95% 
## 1.564541 0.716458 2.416746
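A complementary one-line summary is the posterior probability that the difference is positive (this uses the delta.b vector computed above):

```r
# proportion of posterior samples in which the high-SES slope
# is larger than the low-SES slope
mean(delta.b > 0)
```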

And we can plot a histogram of the contrast:

hist(delta.b, xlim = range(c(0,delta.b)))
abline(v = 0, lty = 2)
abline(v = PI(delta.b,prob = c(.95)), col = "red")

Compare this with the full parameter summary of the model:

precis(IQ.AxSES, depth = 2) %>% round(2)
##         mean   sd  5.5%  94.5%
## a[1]   90.52 0.78 89.28  91.76
## a[2]  100.37 0.76 99.15 101.58
## b[1]    1.32 0.36  0.74   1.90
## b[2]    2.89 0.37  2.29   3.49
## sigma   6.65 0.38  6.05   7.26
Finally, we can compare the estimated slopes with the true average weights from the simulation, recovering the original low- and high-SES indices:

low.SES = which(SES < quantile(SES,.30))
high.SES = which(SES > quantile(SES,.70))

rbind(
  b.low.SES = b_A(SES[low.SES]) %>% mean(), 
  b.high.SES = b_A(SES[high.SES]) %>% mean())
##                [,1]
## b.low.SES  1.760083
## b.high.SES 3.275656